Applying Supervised Learning to Real-World Problems

Author

  • Dragos D. Margineantu
Abstract

Recent years have seen machine learning methods applied to an increasing variety of application problems, such as language, handwriting, and speech processing, document classification, knowledge discovery in databases, industrial process control and diagnosis, fraud and intrusion detection, image analysis, and many others. Our work starts from the realization that most of these problems require significant reformulation before learning algorithms can be applied and that, in many cases, existing algorithms require modifications before being applied to a problem.

The problems mentioned above differ in many aspects but, if subdivided into smaller problems (the approach commonly taken), the sub-problems can often be formulated and approached with similar, unified supervised learning techniques. However, this divide-and-conquer process creates dependencies in the data that violate the assumptions that the data are independent and identically distributed (iid) and that all errors are of equal cost; addressing these issues requires changes to existing learning algorithms. The purpose of my thesis is to identify some procedural steps that are shared among the learning approaches to complex real-world problems, and to develop robust general-purpose techniques and tools to replace the ad hoc decisions that are currently made during an application effort. The main topics my thesis will emphasize are described as follows.

Learning with misclassification costs. Most classification algorithms assume a uniform class distribution and try to minimize the misclassification error. However, many applications require classifiers that minimize an asymmetric loss function rather than the raw misclassification rate. In general, the cost of a wrong prediction depends both on the actual class and on the predicted class. One way to incorporate loss information into classifiers is to alter the class priors based on the labels of the training examples. For 2-class problems, this can be easily accomplished for any loss matrix, but for k > 2 classes, it is not sufficient. We have studied methods for setting the priors to best approximate an arbitrary k x k loss matrix in decision tree learners. We are currently studying alternative methods to incorporate loss information into decision trees, neural networks, and other supervised learning algorithms.

Incorporating prior and common sense knowledge and learning with non-independent data examples. Generally, besides the data available, there is simple, common sense knowledge about the data that is not made explicit in the dataset or database. One question is whether this knowledge is useful, and, if so, how it can be represented and …
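As a rough illustration of the loss-matrix idea in the abstract, the sketch below picks the prediction with the lowest expected loss and, for the 2-class case, shows that reweighting the class probabilities by the misclassification costs leads to the same decision. It is not the thesis's method: the function names, the loss values, and the fraud example are assumptions made for illustration, and the loss matrix is assumed to have a zero diagonal.

```python
def min_expected_loss_class(probs, loss):
    """Return the class index whose prediction has the lowest expected loss.

    probs[i]   : estimated probability that the true class is i
    loss[i][j] : cost of predicting class j when the true class is i
    """
    k = len(probs)
    expected = [sum(probs[i] * loss[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])


def reweighted_argmax_two_class(probs, loss):
    """2-class shortcut: scale each class probability by the cost of
    misclassifying that class, then predict the class with the larger value.
    Assumes loss[0][0] == loss[1][1] == 0 (hypothetical helper)."""
    weights = [loss[0][1], loss[1][0]]
    scaled = [probs[0] * weights[0], probs[1] * weights[1]]
    return 0 if scaled[0] >= scaled[1] else 1


# Hypothetical example: class 0 = legitimate, class 1 = fraud.
loss = [[0.0, 1.0],    # true class legitimate: a false alarm costs 1
        [10.0, 0.0]]   # true class fraud: a missed fraud costs 10
probs = [0.8, 0.2]     # assumed classifier output for one example

most_probable = max(range(2), key=lambda i: probs[i])
cost_sensitive = min_expected_loss_class(probs, loss)
reweighted = reweighted_argmax_two_class(probs, loss)
print(most_probable, cost_sensitive, reweighted)
```

With these assumed numbers, the most probable class is "legitimate", yet both cost-aware rules predict "fraud" because a missed fraud is ten times as costly as a false alarm. With more than two classes, one weight per class cannot express a cost that also depends on the predicted class, which is consistent with the abstract's remark that altering the priors is not sufficient for k > 2.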

Similar resources

Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance

This thesis examines machine learning through the lens of human-computer interaction in order to address fundamental questions surrounding the application of machine learning to real-life problems, including: Can we make machine learning algorithms more usable? Can we better understand the real-world consequences of algorithm choices and user interface designs? How can we devise more effective ...

A Comparison of Multi-instance Learning Algorithms

Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are...

Instance-level Semisupervised Multiple Instance Learning

Multiple instance learning (MIL) is a branch of machine learning that attempts to learn information from bags of instances. Many real-world applications such as localized content-based image retrieval and text categorization can be viewed as MIL problems. In this paper, we propose a new graph-based semi-supervised learning approach for multiple instance learning. By defining an instance-level g...

Elements of Generative Manifold Learning for semi-supervised tasks

For many real-world application problems, the availability of data labels for supervised learning is rather limited. It is often the case that a limited number of labelled cases is accompanied by a larger number of unlabelled ones. This is the setting for semi-supervised learning, in which unsupervised approaches assist the supervised problem and vice versa. In this report, we outline some basic ...

Improving Classification Accuracy of Large Test Sets Using the Ordered Classification Algorithm

We present a new algorithm, called Ordered Classification, that is useful for classification problems where only a few labeled examples are available but a large test set needs to be classified. In many real-world classification problems, it is expensive and sometimes infeasible to acquire a large training set; thus, traditional supervised learning algorithms often perform poorly. In our algorith...

Classification of Protein Localisation Patterns via Supervised Neural Network Learning

There are so many existing classification methods from diverse fields including statistics, machine learning and pattern recognition. New methods have been invented constantly that claim superior performance over classical methods. It has become increasingly difficult for practitioners to choose the right kind of the methods for their applications. So this paper is not about the suggestion of a...

Publication date: 1999